ABSTRACT

Three important methodological areas of multivariate
analysis are discussed that are not always thought of in terms of latent variable
constructs, but for which latent variable modeling can be used to
great advantage: (1) random coefficients describing individual
differences in growth; (2) unobserved variables corresponding to missing
data; and (3) variance components describing data from cluster sampling.
An educational achievement data set of longitudinal observations on
secondary mathematics achievement (the National Longitudinal Study of
American Youth) is described as a motivating example. It is shown that all
three topics can be simply expressed in terms of latent variable
modeling that fits into existing and generally available structural
modeling software. This approach makes possible a connection between
psychometricians and other methodologists interested in latent
variable modeling. Interesting extensions of these statistical
analyses are discussed. One table presents missing data patterns.
LATENT VARIABLE MODELING OF GROWTH WITH MISSING DATA
AND MULTILEVEL DATA¹
Bengt Muthen, CRESST/University of California, Los Angeles 

1. Introduction 

The aim of this paper is to describe three important areas of multivariate
analysis methodology that are not always thought of in terms of latent variable
constructs, but for which latent variable modeling can be used to great
advantage: random coefficients describing individual differences in growth;
unobserved variables corresponding to missing data; and variance
components describing data from cluster sampling. An educational
achievement data set will be described as a motivating example. Using the
features of the example, it will be shown that all three topics can be simply
expressed in terms of latent variable modeling that fits into existing and
generally available structural modeling software. This development makes a
connection between mainstream statistical methods and work by
psychometricians and other methodologists interested in latent variable
modeling. Once the methodology is put in a general latent variable context,
several interesting extensions of the statistical analyses become evident.

2. A General Latent Variable Framework

Analysis of latent variable models is most often carried out by minimizing
the following fitting function

$$ F = \frac{1}{2N} \sum_{p=1}^{P} N_p \left[ \ln|\Sigma_p| + \operatorname{tr}\left(\Sigma_p^{-1} T_p\right) - \ln|S_p| - r \right], \tag{1} $$

where

$$ T_p = S_p + (\bar{y}_p - \mu_p)(\bar{y}_p - \mu_p)'. \tag{2} $$

1 I thank Ginger Nelson, who provided helpful research assistance. 
In maximum-likelihood (ML) estimation of conventional structural
equation models with latent variables, this is the fitting function
corresponding to independent random samples from $P$ populations with
sample sizes $N_p$ and total sample size $N$. Here, an $r$-dimensional vector $y$, say,
is observed with sample covariance matrix $S_p$, sample mean vector $\bar{y}_p$,
population covariance matrix $\Sigma_p$, and population mean vector $\mu_p$. The terms
containing $\ln|S_p| - r$ are offsets chosen so that a perfectly fitting model has a
function value of zero. The sample covariance matrices $S_p$ are the ML
estimates of the unrestricted $\Sigma_p$ matrices and are therefore computed with divisor $N_p$, not
$N_p - 1$. Multiplying the minimum value for any model by $2 \times N$ then gives the
value of the likelihood-ratio chi-square test of the $H_0$ model against the $H_1$
model of unrestricted mean vectors $\mu_p$ and covariance matrices $\Sigma_p$. Many
models do not impose any restrictions on $\mu_p$, in which case the second term on
the right-hand side of (2) vanishes and only covariance matrices are involved
in the estimation. A simultaneous analysis of several populations is
used when the populations have parameters in common, so that across-population
equality constraints on parameters are invoked.
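A minimal NumPy sketch of the fitting function in (1)-(2) follows. It assumes $F$ carries the $1/(2N)$ scaling implied by the statement that $2 \times N$ times its minimum is the likelihood-ratio chi-square. The function name `ml_fit` and its argument layout are hypothetical and are not taken from any structural modeling program.

```python
import numpy as np


def ml_fit(S_list, ybar_list, Sigma_list, mu_list, N_list):
    """Multiple-group ML fitting function F of (1)-(2).

    S_list: ML sample covariance matrices S_p (divisor N_p)
    ybar_list: sample mean vectors; Sigma_list, mu_list: model-implied moments
    N_list: group sample sizes N_p
    Assumes the 1/(2N) scaling, so that 2 * N * F is the LR chi-square.
    """
    N = sum(N_list)
    total = 0.0
    for S, ybar, Sigma, mu, Np in zip(S_list, ybar_list, Sigma_list, mu_list, N_list):
        r = S.shape[0]
        dev = (np.asarray(ybar) - np.asarray(mu)).reshape(-1, 1)
        T = S + dev @ dev.T                                  # equation (2)
        _, logdet_Sigma = np.linalg.slogdet(Sigma)
        _, logdet_S = np.linalg.slogdet(S)
        trace_term = np.trace(np.linalg.solve(Sigma, T))     # tr(Sigma_p^-1 T_p)
        total += Np * (logdet_Sigma + trace_term - logdet_S - r)
    return total / (2.0 * N)


# One group of 200 simulated cases, fitted by a zero-mean, identity-covariance model.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
ybar = X.mean(axis=0)
S = np.cov(X, rowvar=False, bias=True)               # divisor N, as in the text
F_sat = ml_fit([S], [ybar], [S], [ybar], [200])      # saturated model: F = 0
F_h0 = ml_fit([S], [ybar], [np.eye(3)], [np.zeros(3)], [200])
chi_square = 2 * 200 * F_h0
print(F_sat, chi_square)
```

The saturated model reproduces $S_p$ and $\bar{y}_p$ exactly, so its value is zero up to rounding; any restricted model gives a positive value.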

The specification of latent variable models in terms of $\mu_p$ and $\Sigma_p$ is
described in several sources (see, e.g., Joreskog, 1977; Muthen, 1983). One
common framework is as follows. For a certain population, a linear
measurement model for a latent variable vector $\eta$ is specified,

$$ y = \nu + \Lambda \eta + \epsilon, \tag{3} $$

where $\nu$ and $\Lambda$ contain measurement intercept and loading (slope)
parameters, respectively, and $\epsilon$ denotes a vector of measurement errors. In
addition, linear structural equations are specified for $\eta$,

$$ \eta = \alpha + B \eta + \zeta, \tag{4} $$

where $\alpha$ and $B$ contain structural regression intercepts and slopes,
respectively, and $\zeta$ denotes a vector of residuals. With $E(\zeta) = 0$, $V(\epsilon) = \Theta$,
and $V(\zeta) = \Psi$, the usual assumptions give the mean and covariance structures for the $y$ vector as

$$ \mu = \nu + \Lambda (I - B)^{-1} \alpha, \tag{5} $$

$$ \Sigma = \Lambda (I - B)^{-1} \Psi \left[(I - B)^{-1}\right]' \Lambda' + \Theta. \tag{6} $$
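To make (5) and (6) concrete, the sketch below computes the model-implied mean vector and covariance matrix from given parameter matrices with NumPy. The function name and the small two-factor example (a structural regression $\eta_2 = 0.5\,\eta_1 + \zeta_2$, identity loadings, no measurement error) are illustrative assumptions, not part of the original text.

```python
import numpy as np


def implied_moments(nu, Lam, alpha, B, Psi, Theta):
    """Model-implied mean (5) and covariance (6) for the model (3)-(4)."""
    m = B.shape[0]
    inv = np.linalg.inv(np.eye(m) - B)                   # (I - B)^{-1}
    mu = nu + Lam @ inv @ alpha                          # equation (5)
    Sigma = Lam @ inv @ Psi @ inv.T @ Lam.T + Theta      # equation (6)
    return mu, Sigma


# Two factors with eta_2 = 0.5 * eta_1 + zeta_2 and unit residual variances.
B = np.array([[0.0, 0.0], [0.5, 0.0]])
mu, Sigma = implied_moments(nu=np.zeros(2), Lam=np.eye(2), alpha=np.array([1.0, 0.0]),
                            B=B, Psi=np.eye(2), Theta=np.zeros((2, 2)))
print(mu)      # mean vector (1, 0.5)
print(Sigma)   # covariance [[1, 0.5], [0.5, 1.25]]
```

Here $(I - B)^{-1}$ carries the effect of $\eta_1$ into $\eta_2$, so the variance of $\eta_2$ becomes $0.5^2 + 1 = 1.25$.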

3. A Motivating Example

The example concerns longitudinal observations on mathematics 
achievement in grades 7-12 collected in the U.S. within the National 
Longitudinal Study of American Youth (LSAY) (Miller, Suchner, Hoffer,
Brown, & Pifer, 1991). Two cohorts were followed, one spanning grades 7-10
and the other grades 10-12. The mathematics curriculum is quite varied in the 
U.S. and students are likely to show differences in growth as a function of 
differences in background characteristics such as course taking and gender. 
The test measures mathematics skills in a number of subtopics including 
algebra, probability & statistics, geometry, measurement, and arithmetic. 
Topic-specific subtest scores are of interest, but since there is a rather small 
number of items within subtopics, there is a need to allow for measurement 
error in such subscores, for example, by specifying a factor-analytic 
measurement model. 

In order to measure different ability levels, the test items that are 
administered vary across grades and groups of students within grades. The 
various test forms do, however, have many items in common so that the 
various test forms can be equated. Due to the large variation in mathematics 
achievement, an adaptive testing strategy was employed in the LSAY in order 
to avoid floor and ceiling effects and to maximize the information obtained on 
the students' achievement level. Given the performance at the first testing
occasion, an easy, medium, or hard test form was chosen for the next grade,
and the test form could be changed again in subsequent grades. The test forms
also differed across grades within a difficulty designation. Table 1 shows the
different groups of individuals in the youngest cohort taking different sets of 
tests. It is seen that the adaptive testing strategy gives rise to certain patterns 
of missing data. Missing data also occurs due to attrition so that not all 
students have observations for all grades. 

As is typical for large-scale educational data, the LSAY data are obtained 
through multi-stage, complex sampling. A key feature is that about 60 
students are randomly sampled within each of about 60 schools. It is well- 
known that assuming simple random sampling when data have in fact been 
obtained by cluster sampling leads to deflated standard errors of estimates
(see, e.g., Skinner, Holt, & Smith, 1989). This effect is often described in terms
of the "design effect" (deff), the ratio of the variance of an estimate under the
actual sampling design to its variance under simple random sampling. To illustrate
the effect of this cluster sampling feature, intraclass correlations were calculated
for a set of achievement variables obtained at the seventh grade. Testlets
corresponding to topic-specific sums of items scored right/wrong were used for
the following topics (intraclass correlation in parentheses): algebra (.03),
probability & statistics (.15), geometry (.12), measurement (.12), methods (.05),
numbers & operations 1 (.10), numbers & operations 2 (.08), numbers &
operations 3 (.09), numbers & operations 4 (.13), organization (.09). Several
intraclass correlations are larger than .10. The deff formula for the variance
estimate of a mean, $1 + (c - 1)\rho$ for cluster size $c$ and intraclass correlation
$\rho$ (Cochran, 1977, p. 242), gives a sizeable design effect of about 7 because of
the large cluster size of 60: with $\rho = .10$ and $c = 60$, $1 + 59 \times .10 = 6.9$.
The intraclass correlations may in fact be deflated, since the within-school
variance is likely to contain a large amount of measurement error variance
(see Muthen, 1991).
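The design-effect arithmetic can be applied to every testlet at once. The sketch below assumes a common cluster size of $c = 60$ for all schools, which is only an approximation since the text reports about 60 students per school; the function name is hypothetical.

```python
def design_effect(icc, cluster_size):
    """Design effect 1 + (c - 1) * rho for the variance of a mean (Cochran, 1977)."""
    return 1.0 + (cluster_size - 1) * icc


# Seventh-grade testlet intraclass correlations listed in the text.
iccs = {
    "algebra": .03, "probability & statistics": .15, "geometry": .12,
    "measurement": .12, "methods": .05, "numbers & operations 1": .10,
    "numbers & operations 2": .08, "numbers & operations 3": .09,
    "numbers & operations 4": .13, "organization": .09,
}
deffs = {topic: design_effect(rho, 60) for topic, rho in iccs.items()}
for topic, deff in deffs.items():
    # sqrt(deff) is the factor by which simple-random-sampling standard errors are too small
    print(f"{topic:26s} deff = {deff:5.2f}  SE ratio = {deff ** 0.5:4.2f}")
```

Even the smallest correlation (.03 for algebra) gives a design effect near 2.8, so ignoring the clustering understates standard errors for every testlet.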

4. Modeling of Individual Differences in Growth 

For the example discussed in the previous section, consider an 
achievement score for individual i at time point t where t corresponds to the 
different grades ($t = 0, 1, \ldots, T$, say),

$$ y_{ti} = \alpha_i + \beta_i t + \zeta_{ti}. \tag{7} $$

In (7), $\alpha_i$ and $\beta_i$ are individual-specific parameters describing the initial level
of achievement and the rate of learning, while $\zeta_{ti}$ represents a residual. The
characteristic feature of this model is that the regression intercepts and slopes
are random coefficients that vary over individuals, possibly as a function of
individual-specific values of a time-invariant covariate $z_i$,

$$ \alpha_i = \alpha + \gamma_\alpha z_i + \delta_{\alpha i}, \tag{8} $$

$$ \beta_i = \beta + \gamma_\beta z_i + \delta_{\beta i}. \tag{9} $$
Here, $\alpha$ and $\beta$ represent overall values, the $\gamma$'s are regression parameters,
and the $\delta$'s represent residuals. The residuals for the intercepts and the slopes
may be correlated, so that the growth rate may be related to initial status. As
an example, $z$ may represent participation in enriched or algebra classes, in
which case the $\gamma$'s are likely to be positive. The random intercepts $\alpha_i$ and
random slopes $\beta_i$ may also be estimated for each individual, so that an
individual-specific growth curve can be derived.

It may be noted that instead of assuming growth that is linear in $t$, as in
(7), any function of $t$ may be used, including functions involving parameters to
be estimated, such as logistic growth and exponential decline.

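The model implies growth in means and variances as a function of $t$ and $z$,

$$ E(y_{ti} \mid z_i) = \alpha + \gamma_\alpha z_i + (\beta + \gamma_\beta z_i)\, t, \tag{10} $$

$$ V(y_{ti} \mid z_i) = \sigma_\alpha^2 + 2t\,\sigma_{\alpha\beta} + t^2 \sigma_\beta^2 + \sigma_\zeta^2, \tag{11} $$

where $\sigma_\alpha^2$, $\sigma_{\alpha\beta}$, and $\sigma_\beta^2$ are the variances and covariance of $\delta_{\alpha i}$ and $\delta_{\beta i}$, and $\sigma_\zeta^2$ is the residual variance of $\zeta_{ti}$.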
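The implied moments in (10) and (11) can be checked numerically. The NumPy sketch below evaluates both formulas and compares them with moments of data simulated from (7)-(9). It assumes that $\zeta_{ti}$ has constant variance over time and is uncorrelated with $\delta_{\alpha i}$ and $\delta_{\beta i}$; all parameter values and the function name are hypothetical.

```python
import numpy as np


def growth_moments(t, z, alpha, beta, gamma_a, gamma_b, var_a, cov_ab, var_b, var_zeta):
    """Conditional mean (10) and variance (11) of y_ti given z_i."""
    t = np.asarray(t, dtype=float)
    mean = alpha + gamma_a * z + (beta + gamma_b * z) * t       # equation (10)
    var = var_a + 2 * t * cov_ab + t ** 2 * var_b + var_zeta    # equation (11)
    return mean, var


# Hypothetical parameter values, four time points, covariate value z = 1.
times = np.arange(4)
mean, var = growth_moments(times, z=1.0, alpha=50.0, beta=3.0, gamma_a=2.0, gamma_b=0.5,
                           var_a=25.0, cov_ab=2.0, var_b=1.0, var_zeta=4.0)

# Monte Carlo check: simulate (7)-(9) and compare sample moments with (10)-(11).
rng = np.random.default_rng(1)
n = 200_000
delta = rng.multivariate_normal([0.0, 0.0], [[25.0, 2.0], [2.0, 1.0]], size=n)
a_i = 50.0 + 2.0 * 1.0 + delta[:, 0]                             # equation (8)
b_i = 3.0 + 0.5 * 1.0 + delta[:, 1]                              # equation (9)
y = a_i[:, None] + b_i[:, None] * times + rng.normal(scale=2.0, size=(n, 4))   # (7)
print(mean, y.mean(axis=0))
print(var, y.var(axis=0))
```

Because the slope varies across individuals, the variance grows quadratically in $t$, from 29 at $t = 0$ to 50 at $t = 3$ with these values.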

The model may be extended by adding a time-varying covariate $x_{ti}$ to the
growth curve of (7),

$$ y_{ti} = \alpha_i + \beta_i t + \gamma_t x_{ti} + \zeta_{ti}. \tag{12} $$

In the context of the present achievement example, $x_{ti}$ may represent the amount
of course work prior to time point $t$ for individual $i$.

The above growth model can be seen as a model with latent variables. As
is clear from (7)-(9), $\alpha_i$ and $\beta_i$ can be viewed as latent variables instead of
random parameters (Muthen, 1991, 1992). Both $\alpha_i$ and $\beta_i$ are unobserved i.i.d.
variables varying across individuals. Because $t$ does not vary over individuals,
$t$ can be viewed as a fixed regression parameter (a fixed loading) for the latent
variable $\beta_i$, while $\alpha_i$ has a fixed loading of one at every time point. The model
fits into the general framework of equations (3)-(6) by letting $\eta$ contain $\alpha_i$ and $\beta_i$.
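The latent variable formulation can be written out explicitly. In the sketch below, $\eta = (\alpha_i, \beta_i)'$, $\nu = 0$, $B = 0$, and $\Lambda$ has a column of ones and a column of time scores, so (5) and (6) reduce to $\mu = \Lambda E(\eta)$ and $\Sigma = \Lambda \Psi \Lambda' + \Theta$. It assumes that the covariate $z$ is absorbed into the factor means (the moments are conditional on $z$) and that $\Theta = \sigma_\zeta^2 I$; the function name and parameter values are hypothetical.

```python
import numpy as np


def latent_curve_moments(times, mean_eta, Psi, var_zeta):
    """Growth model (7) cast as (3)-(6): eta = (alpha_i, beta_i), nu = 0, B = 0,
    and fixed loadings Lambda = [1, t], so y_t = alpha_i + t * beta_i + zeta_t."""
    times = np.asarray(times, dtype=float)
    Lam = np.column_stack([np.ones_like(times), times])          # fixed loadings
    mu = Lam @ mean_eta                                          # (5) with nu = 0, B = 0
    Sigma = Lam @ Psi @ Lam.T + var_zeta * np.eye(len(times))    # (6) with Theta = var_zeta * I
    return Lam, mu, Sigma


# Hypothetical values: E(alpha_i) = 52, E(beta_i) = 3.5, Psi = V(alpha_i, beta_i).
Lam, mu, Sigma = latent_curve_moments([0, 1, 2, 3], np.array([52.0, 3.5]),
                                      np.array([[25.0, 2.0], [2.0, 1.0]]), 4.0)
print(Lam)
print(mu)
print(np.diag(Sigma))
```

The diagonal of $\Sigma$ reproduces the variance formula (11) at each time point, and the off-diagonal elements give the covariances between time points that the growth model also restricts.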

This type of modeling is an example of the latent curve analysis of Tucker,
Meredith, McArdle, and others (see, e.g., Meredith & Tisak, 1990). The growth
model imposes restrictions on both the mean vector and the covariance matrix
for the observed variables. In this way, both $\mu$ and $\Sigma$ of (1) are used in the
estimation. A single population is used.

The structural modeling approach to longitudinal data makes for a very
flexible modeling framework. Multiple indicators can be handled so that
growth pertains to latent variables free of measurement error. In the math
achievement example, it is reasonable to assume that the testlets measure a
single factor $\eta_{ti}$. In this case the factor $\eta_{ti}$ replaces $y_{ti}$ in (7), and the testlets
correspond to multiple indicators $y_{tij}$ as in (3),

$$ y_{tij} = \nu_j + \lambda_j \eta_{ti} + \epsilon_{tij}, \tag{13} $$

$j = 1, 2, \ldots, J$, where $\nu$ is a measurement intercept parameter, $\lambda$ is a
measurement loading parameter, and $\epsilon$ represents measurement error,
assumed to be uncorrelated with $\eta$ and with each other. Binary and
ordered categorical variables can also be handled in this framework (Muthen,
1983, 1992).

5. Modeling of Missing Data 

For the motivating example discussed in Section 3, Table 1 showed the 
pattern of missing data. The missingness was both by design due to the use of 
adaptive testing and due to attrition. Missing data theory is presented in Little 
and Rubin (1987) and is discussed in the latent variable context by Allison 
(1987) and Muthen, Kaplan, and Hollis (1987). Following Muthen et al. (1987),
we may modify the measurement model of (3) as 

$$ y^* = \nu + \Lambda \eta + \epsilon, \tag{14} $$

$$ s^* = \Gamma y^* + \delta. \tag{15} $$

Here, $y^*$ and $s^*$ are sets of $r$ continuous latent variables assumed to be
multivariate normal. The residual vector $\delta$ is possibly correlated with $\eta$ and $\epsilon$.
Using a threshold parameter $\tau_j$, each $s^*_{ij}$ variable defines a probit regression
describing the propensity for $y^*_{ij}$ to be observed for individual $i$,

$$ y_{ij} = \begin{cases} y^*_{ij}, & \text{if } s^*_{ij} > \tau_j, \\ \text{missing}, & \text{otherwise.} \end{cases} \tag{16} $$

Returning to the missing data example of Table 1, consider the first and
last missing data patterns. Let the observed test scores in grade 7 be denoted $y_1$
and the scores of the test sequence E, E, E in grades 8, 9, and 10 be denoted $y_2$. In
this way, there is no missingness on $y_1$ for either pattern, whereas the last
pattern has missing data for $y_2$. Let $y_2$ contain $p$ variables, define $K_i$ as

$$ \Pr(s^*_{i1} < \tau_1, s^*_{i2} < \tau_2, \ldots, s^*_{ip} < \tau_p) = K_i, \tag{17} $$

and let $\phi$ denote multivariate normal densities. The likelihood component for a
sample unit in the last missing data pattern is then obtained by integrating
over the $p$ latent variables $y^*_2$ in a truncated normal distribution; this gives
expression (18).

The conditional normal density inside the integrals of (18) depends on the
specification of the relationship between $s^*$ and $y^*$ in (14) and (15). Consider
the case where, conditional on $y^*_1$, $s^*$ is independent of $y^*_2$, so that $s^*$ is only
influenced by $y^*_1$ in (15). In our example, $y^*_1$ is observed as $y_1$. Then the
conditional density in (18) does not involve parameters of the latent variable
model but only parameters describing how $y_1$ predicts the missingness on $y^*_2$.
In this case the missing data mechanism is "ignorable", and correct ML
estimation of the latent variable model is obtained using only the $\phi(y_{1i})$ term
in (18), which corresponds to the data that are not missing.
In our example, ignorability for the data that are missing by design holds
if the test form for a certain grade is indeed only dependent on the performance
on the test in the previous year. Attrition may be predicted by factors that also
influence the performance on the tests taken. Missingness by attrition is
ignorable if, conditional on such factors, the values of the missing test scores
are independent of the values of the observed test scores.
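The practical meaning of ignorability can be checked with a small simulation. The sketch below is an illustration, not the procedure of Muthen et al. (1987): it draws bivariate normal scores, lets a propensity $s^* = -y_1 + \delta$ decide whether $y_2$ is observed (threshold $\tau = -0.5$, as in (16)), and compares the complete-case mean of $y_2$ with the ML estimate obtained from the closed-form factored likelihood for this pattern (Anderson, 1957). The function name, the propensity equation, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np


def factored_ml_means(y1, y2):
    """ML means when y1 is always observed and y2 is NaN for some cases.

    Marginal part: mean of y1 from all N cases.
    Conditional part: regression of y2 on y1 from the C complete cases.
    """
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    obs = ~np.isnan(y2)
    mu1 = y1.mean()
    x, y = y1[obs], y2[obs]
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    return mu1, b0 + b1 * mu1


rng = np.random.default_rng(2)
n = 20_000
y1 = rng.normal(size=n)
y2 = 0.6 * y1 + rng.normal(scale=0.8, size=n)    # corr(y1, y2) = 0.6, both mean 0
s_star = -y1 + rng.normal(size=n)                 # propensity depends on y1 only
y2_obs = np.where(s_star > -0.5, y2, np.nan)      # observed when s* > tau, as in (16)

mu1_ml, mu2_ml = factored_ml_means(y1, y2_obs)
mu2_cc = np.nanmean(y2_obs)                       # complete-case estimate
print(f"ML estimate of E(y2): {mu2_ml:.3f}; complete-case mean: {mu2_cc:.3f}")
```

Because high $y_1$ values make $y_2$ more likely to be missing, the complete-case mean is biased downward, whereas the estimate that uses all observed $y_1$ values stays close to the true mean of zero without modeling the missingness.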

Again considering the first and last missing data patterns of Table 1, and
assuming ignorability, (18) suggests that the log likelihood may be written as

$$ \log L = \sum_{i=1}^{N} \log \phi(y_{1i}) + \sum_{i=1}^{C} \log \phi(y_{2i} \mid y_{1i}), \tag{19} $$

where $N$ is the total number of cases in the two patterns and $C$ is the number of
individuals that have complete data. The second term on the right-hand side of
(19) contains the regression parameters, while the first term contains the
parameters of the marginal distribution of $y_1$. As pointed out by Anderson
(1957), in the case of an unrestricted model the parameters of these two parts
can be estimated separately, and the estimates have closed-form expressions.
For a latent variable model, which is a restricted case, no closed-form
expression exists, however, and the advantage of writing the likelihood
in the form of (19) disappears. Muthen et al. (1987) instead proposed the use of
the equivalent form

$$ \log L = \sum_{i=1}^{C} \log \phi(y_{1i}, y_{2i}) + \sum_{i=C+1}^{N} \log \phi(y_{1i}). \tag{20} $$

The two terms on the right-hand side of (20) involve two different groups of
individuals corresponding to the two different patterns. Equation (20) shows
that the standard multiple-group structural modeling fitting function of (1)
can be used for the estimation. Under ignorability, a simultaneous analysis of
the two groups, using different numbers of observed variables in the two groups
and across-group equality restrictions on common parameters, yields ML
estimates of the latent variable model parameters. Muthen et al. (1987)
describe how to set up this analysis using structural modeling programs and
show how the model can be tested. The approach may be generalized to involve
groups corresponding to all the different missing data patterns of Table 1.

6. Modeling of Multilevel Data 

The final area to be discussed in terms of latent variable modeling is that
of variance components describing data from cluster sampling. In the math
achievement example, students were sampled within schools, and the
intraclass correlation coefficients showed that the degree of dependence
among student observations from the same school was quite large. In order
for the fitting function of (1) to give proper ML estimates, standard errors of
estimates, and a chi-square measure of model fit, this deviation from simple
random sampling needs to be taken into account. Statistical theory for such
situations is described in Skinner, Holt, and Smith (1989). Recently,
psychometricians have extended this work to encompass latent variable 
modeling (see, e.g., McDonald & Goldstein, 1989). For an overview, see 
Muthen and Satorra (1989), Muthen (1989) and Muthen and Satorra (1991). In 
this work, parameters are added to those of conventional modeling in order to 
properly describe the variation due to the different stages of cluster sampling. 
This has given rise to the name multilevel modeling (see, e.g., Bock, 1989). 

The following model describes both the school- and student-level variation.
Letting the index $g$ denote school, we may consider the $r$-dimensional vector of
observed scores $y_{gi}$ for individual $i$ and a $q$-dimensional vector $z_g$ for school $g$
as follows. We may assume $g = 1, 2, \ldots, G$ independently observed groups with
$i = 1, 2, \ldots, N_g$ individual observations within group $g$, and arrange the data
vector for which independent observations are obtained as

$$ d_g' = (z_g', y_{g1}', y_{g2}', \ldots, y_{gN_g}'), \tag{21} $$

where we note that the length of $d_g$ varies across groups. The mean vector and
covariance matrix of $d_g$ are assumed to have the structures

$$ \mu_{d_g}' = [\mu_z', 1_{N_g}' \otimes \mu_y'], \tag{22} $$

$$ \Sigma_{d_g} = \begin{bmatrix} \Sigma_{zz} & \text{symmetric} \\ 1_{N_g} \otimes \Sigma_{yz} & I_{N_g} \otimes \Sigma_W + 1_{N_g} 1_{N_g}' \otimes \Sigma_B \end{bmatrix}, \tag{23} $$

where $I_{N_g}$ is an identity matrix of dimension $N_g$, $1_{N_g}$ is a vector of ones of length
$N_g$, and the symbol $\otimes$ denotes the Kronecker product.
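The Kronecker structure of (22) and (23) can be built directly with NumPy. The sketch below assembles $\mu_{d_g}$ and $\Sigma_{d_g}$ for one school, showing that two students in the same school share the between-school covariance $\Sigma_B$, while each student's own block is $\Sigma_W + \Sigma_B$. The function name and the numerical values are hypothetical.

```python
import numpy as np


def cluster_moments(Ng, mu_z, mu_y, Sigma_zz, Sigma_yz, Sigma_W, Sigma_B):
    """Mean vector (22) and covariance matrix (23) of d_g for a group of size Ng."""
    ones = np.ones((Ng, 1))
    mu = np.concatenate([mu_z, np.kron(np.ones(Ng), mu_y)])                 # (22)
    C_yz = np.kron(ones, Sigma_yz)                                         # 1_Ng (x) Sigma_yz
    C_yy = np.kron(np.eye(Ng), Sigma_W) + np.kron(ones @ ones.T, Sigma_B)  # within + between
    Sigma = np.block([[Sigma_zz, C_yz.T], [C_yz, C_yy]])                   # (23)
    return mu, Sigma


# One school-level variable (q = 1), two student-level scores (r = 2), three students.
Sigma_W = np.array([[1.0, 0.1], [0.1, 2.0]])
Sigma_B = np.array([[0.5, 0.2], [0.2, 0.3]])
mu, Sigma = cluster_moments(3, np.array([0.0]), np.array([10.0, 20.0]),
                            np.array([[1.0]]), np.array([[0.3], [0.1]]), Sigma_W, Sigma_B)
print(mu.shape, Sigma.shape)   # d_g has length q + Ng * r = 7
```

Because the dimension of $\Sigma_{d_g}$ grows with $N_g$, evaluating (24) this way quickly becomes expensive for schools of about 60 students.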



Assuming multivariate normality of $d_g$ leads to the minimization of the
ML fitting function

$$ \sum_{g=1}^{G} \left\{ \log|\Sigma_{d_g}| + (d_g - \mu_{d_g})' \Sigma_{d_g}^{-1} (d_g - \mu_{d_g}) \right\}. \tag{24} $$



As shown in Muthen (1989, 1990), the expression in (24) may be rewritten
in a form that both avoids parameter arrays involving the number of
observations per group and fits into conventional structural equation modeling.
Reducing the summation from $G$ groups to $D$, the number of
distinct group sizes, the ML fitting function may be written as



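$$ \sum_{d=1}^{D} G_d \left\{ \ln|\Sigma_{B_d}| + \operatorname{tr}\left[ \Sigma_{B_d}^{-1} \left( S_{B_d} + N_d (v_d - \mu)(v_d - \mu)' \right) \right] \right\} + (N - G) \left\{ \ln|\Sigma_W| + \operatorname{tr}\left[ \Sigma_W^{-1} S_{PW} \right] \right\}, \tag{25} $$

where $d$ is an index denoting a distinct group-size category with group size $N_d$,
$G_d$ denotes the number of groups of that size,

$$ \Sigma_{B_d} = \begin{bmatrix} N_d \Sigma_{zz} & \text{symmetric} \\ N_d \Sigma_{yz} & \Sigma_W + N_d \Sigma_B \end{bmatrix}, \tag{26} $$

$S_{B_d}$ denotes a between-group matrix

$$ S_{B_d} = N_d G_d^{-1} \sum_{k=1}^{G_d} \begin{bmatrix} z_{dk} - \bar{z}_d \\ \bar{y}_{dk} - \bar{y}_d \end{bmatrix} \begin{bmatrix} (z_{dk} - \bar{z}_d)' & (\bar{y}_{dk} - \bar{y}_d)' \end{bmatrix}, \tag{27} $$

$$ v_d - \mu = \begin{bmatrix} \bar{z}_d - \mu_z \\ \bar{y}_d - \mu_y \end{bmatrix}, \tag{28} $$

with $\bar{z}_d$ and $\bar{y}_d$ representing the sample mean vectors in group-size category $d$, and
$S_{PW}$ defined as the usual pooled-within sample covariance matrix

$$ S_{PW} = (N - G)^{-1} \sum_{g=1}^{G} \sum_{i=1}^{N_g} (y_{gi} - \bar{y}_g)(y_{gi} - \bar{y}_g)'. \tag{29} $$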

On comparison with (1), it is seen that (25) may be viewed as an analysis of $D + 1$
populations with certain parameter equality constraints across populations.
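Two ingredients of (25) are easy to compute from raw data: the pooled-within matrix $S_{PW}$ of (29), and the distinct group sizes $N_d$ with their counts $G_d$, which determine the $D$ between-level "populations". The sketch below computes only these pieces; the function names are hypothetical.

```python
import numpy as np
from collections import Counter


def pooled_within(y, groups):
    """Pooled-within covariance matrix S_PW of (29), divisor N - G."""
    y = np.asarray(y, dtype=float).reshape(len(groups), -1)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    dev = np.empty_like(y)
    for g in labels:
        idx = groups == g
        dev[idx] = y[idx] - y[idx].mean(axis=0)      # y_gi - ybar_g
    return dev.T @ dev / (len(y) - len(labels))


def size_categories(groups):
    """Map each distinct group size N_d to G_d, the number of groups of that size."""
    group_sizes = Counter(np.asarray(groups).tolist()).values()
    return dict(sorted(Counter(group_sizes).items()))


y = np.array([1.0, 3.0, 2.0, 4.0, 6.0])
schools = np.array([0, 0, 1, 1, 1])
print(pooled_within(y, schools))    # 10 / 3
print(size_categories(schools))     # {2: 1, 3: 1}
```

With many different school sizes, the dictionary returned by `size_categories` has many entries, which is the practical burden of (25) noted next.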

ML estimation by optimization of (25) is, however, cumbersome with 
many different group sizes, both in terms of computational work and in terms 
of input specifications for the software. Muthen (1990) proposed a simpler, ad 
hoc estimator which gives results close to those of ML, using the fitting 
function 



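$$ G \left\{ \ln \begin{vmatrix} c\Sigma_{zz} & \text{symmetric} \\ c\Sigma_{yz} & \Sigma_W + c\Sigma_B \end{vmatrix} + \operatorname{tr}\left[ \begin{bmatrix} c\Sigma_{zz} & \text{symmetric} \\ c\Sigma_{yz} & \Sigma_W + c\Sigma_B \end{bmatrix}^{-1} S_B \right] \right\} + (N - G) \left\{ \ln|\Sigma_W| + \operatorname{tr}\left[ \Sigma_W^{-1} S_{PW} \right] \right\}, \tag{30} $$

where

$$ S_B = (G - 1)^{-1} \begin{bmatrix} c \sum_{g=1}^{G} (z_g - \bar{z})(z_g - \bar{z})' & \text{symmetric} \\ c \times G/N \sum_{g=1}^{G} N_g (\bar{y}_g - \bar{y})(z_g - \bar{z})' & \sum_{g=1}^{G} N_g (\bar{y}_g - \bar{y})(\bar{y}_g - \bar{y})' \end{bmatrix}, \tag{31} $$

$$ c = \left[ N^2 - \sum_{g=1}^{G} N_g^2 \right] \left[ N(G - 1) \right]^{-1}, \tag{32} $$

and $S_{PW}$ is as before. On comparison with (1), it is seen that (30) corresponds to
an analysis with two populations, one for the between part and one for the
within part.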
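For the case without school-level variables $z_g$, the ad hoc estimator needs only the scaling constant $c$ of (32) and the $y$-by-$y$ block of $S_B$ in (31), which estimates $\Sigma_W + c\,\Sigma_B$. The sketch below computes both from student-level data; the function name is hypothetical, and with equal group sizes $c$ reduces to the common group size.

```python
import numpy as np


def muml_between(y, groups):
    """Scaling constant c of (32) and the y-by-y block of S_B in (31)."""
    groups = np.asarray(groups)
    y = np.asarray(y, dtype=float).reshape(len(groups), -1)
    labels = np.unique(groups)
    N, G = y.shape[0], len(labels)
    sizes = np.array([np.sum(groups == g) for g in labels])
    c = (N ** 2 - np.sum(sizes ** 2)) / (N * (G - 1))       # equation (32)
    ybar = y.mean(axis=0)
    S_B = np.zeros((y.shape[1], y.shape[1]))
    for g, Ng in zip(labels, sizes):
        d = (y[groups == g].mean(axis=0) - ybar).reshape(-1, 1)
        S_B += Ng * (d @ d.T)                                # N_g (ybar_g - ybar)(ybar_g - ybar)'
    return c, S_B / (G - 1)


# Equal group sizes of 2: c equals the common group size.
c, S_B = muml_between([1.0, 3.0, 5.0, 7.0], [0, 0, 1, 1])
print(c, S_B)   # c = 2.0, S_B = [[16.]]
```

Replacing the many group-size categories of (25) by the single constant $c$ is what reduces the problem to the two-population analysis of (30).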

For the math achievement test scores $y$, a latent variable structure such
as in (6) may be formulated for $\Sigma_B$ and $\Sigma_W$, not necessarily using the same
structure. Muthen (1990) discusses different types of models that may be of
interest. The within structure $\Sigma_W$ would still use a single-factor model,
since it pertains to the student-level structure. The between structure $\Sigma_B$
describes across-school variation in math achievement, and it is harder to
postulate an a priori model for this variation. Experience has shown,
however, that a single-factor model often captures the covariation in $\Sigma_B$ quite
well. The school-level variables $z_g$ may be exemplified by indicators of whether
or not the school "tracks" the 7th- and 8th-grade math programs. Muthen
(1990) gives an example of a latent variable model with $z_g$ variables influencing
the between part of the $y$ variation.

7. Discussion 

A thorough analysis of the math achievement example of Section 3 calls 
for the use of modeling with random coefficients describing individual 
differences in growth, unobserved variables corresponding to missing data, 
and variance components describing data from cluster sampling. The 
previous three sections have described how each of these modeling features 
may be approached in a general latent variable context using existing 
structural equation software. The fitting function of (1) is used in all cases, 
either in one or in several populations using covariance matrix structures and 
possibly also mean vector structures. In an actual analysis of this data set, the 
three approaches need to be combined. This analysis will not be carried out 
here, but it is clear that the fitting function of (1) can also accomplish
this complex task.

This paper has made connections between mainstream multivariate
statistics and work by psychometricians and other methodologists interested in
latent variable modeling. Viewing the methodology from a general latent
variable perspective points to several interesting extensions of the statistical
analyses.